NSF PAR Search | NSF Public Access Repository

A Road map for the Democratization of Space-Based Communications

https://doi.org/10.1145/3696348.3696866

Muriga, Veronica; Kumar, Swarun; Sriraman, Akshitha; Gueye, Assane (November 2024, ACM)

Full Text Available
Designing Cloud Servers for Lower Carbon

https://doi.org/10.1109/ISCA59077.2024.00041

Wang, Jaylen; Berger, Daniel S; Kazhamiaka, Fiodar; Irvene, Celine; Zhang, Chaojie; Choukse, Esha; Frost, Kali; Fonseca, Rodrigo; Warrier, Brijesh; Bansal, Chetan; et al. (June 2024, IEEE)

To mitigate climate change, we must reduce carbon emissions from hyperscale cloud computing. We find that cloud compute servers cause the majority of emissions in a general-purpose cloud. Thus, we motivate designing carbon-efficient compute server SKUs, or GreenSKUs, using recently-available low-carbon server components. To this end, we design and build three GreenSKUs using low-carbon components, such as energy-efficient CPUs, reused old DRAM via CXL, and reused old SSDs. We detail several challenges that limit GreenSKUs’ carbon savings at scale and may prevent their adoption by cloud providers. To address these challenges, we develop a novel methodology and associated framework, GSF (GreenSKU Framework), that enables a cloud provider to systematically evaluate a GreenSKU’s carbon savings at scale. We implement GSF within Microsoft Azure’s production constraints to evaluate our three GreenSKUs’ carbon savings. Using GSF, we show that our most carbon-efficient GreenSKU reduces emissions per core by 28% compared to currently-deployed cloud servers. When designing GreenSKUs to meet applications’ performance requirements, we reduce emissions by 15%. When incorporating overall data center overheads, our GreenSKU reduces Azure’s net cloud emissions by 8%.
Full Text Available
Towards Improved Power Management in Cloud GPUs

https://doi.org/10.1109/LCA.2023.3278652

Patel, Pratyush; Gong, Zibo; Rizvi, Syeda; Choukse, Esha; Misra, Pulkit; Anderson, Thomas; Sriraman, Akshitha (July 2023, IEEE Computer Architecture Letters)

As modern server GPUs are increasingly power intensive, better power management mechanisms can significantly reduce the power consumption, capital costs, and carbon emissions in large cloud datacenters. This letter uses diverse datacenter workloads to study the power management capabilities of modern GPUs. We find that current GPU management mechanisms have limited compatibility and monitoring support under cloud virtualization. They have sub-optimal, imprecise, and non-intuitive implementations of Dynamic Voltage and Frequency Scaling (DVFS) and power capping. Consequently, efficient GPU power management is not widely deployed in clouds today. To address these issues, we make actionable recommendations for GPU vendors and researchers.
Full Text Available
Thermometer: Profile-Guided BTB Replacement for Data Center Applications

https://doi.org/10.1145/3470496.3527430

Song, Shixin; Khan, Tanvir Ahmed; Shahri, Sara Mahdizadeh; Sriraman, Akshitha; Soundararajan, Niranjan K; Subramoney, Sreenivas; Jiménez, Daniel A.; Litz, Heiner; Kasikci, Baris (June 2022, International Symposium on Computer Architecture (ISCA))

Full Text Available
Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications

Khan, Tanvir Ahmed; Zhang, Dexin; Sriraman, Akshitha; Devietti, Joseph; Pokam, Gilles; Litz, Heiner; Kasikci, Baris (June 2021, Proceedings of the 48th International Symposium on Computer Architecture)
Modern data center applications exhibit deep software stacks, resulting in large instruction footprints that frequently cause instruction cache misses, degrading performance, cost, and energy efficiency. Although numerous mechanisms have been proposed to mitigate instruction cache misses, they still fall short of ideal cache behavior, and furthermore, introduce significant hardware overheads. We first investigate why existing I-cache miss mitigation mechanisms achieve sub-optimal performance for data center applications. We find that widely-studied instruction prefetchers fall short due to wasteful prefetch-induced cache line evictions that are not handled by existing replacement policies. Existing replacement policies are unable to mitigate wasteful evictions since they lack complete knowledge of a data center application’s complex program behavior. To make existing replacement policies aware of these eviction-inducing program behaviors, we propose Ripple, a novel software-only technique that profiles programs and uses program context to inform the underlying replacement policy about efficient replacement decisions. Ripple carefully identifies program contexts that lead to I-cache misses and sparingly injects “cache line eviction” instructions in suitable program locations at link time. We evaluate Ripple using nine popular data center applications and demonstrate that Ripple enables any replacement policy to achieve speedup that is closer to that of an ideal I-cache. Specifically, Ripple achieves an average performance improvement of 1.6% (up to 2.13%) over prior work due to a mean 19% (up to 28.6%) I-cache miss reduction.
Full Text Available
Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications

https://doi.org/10.1109/ISCA52012.2021.00063

Khan, Tanvir Ahmed; Zhang, Dexin; Sriraman, Akshitha; Devietti, Joseph; Pokam, Gilles; Litz, Heiner; Kasikci, Baris (June 2021, 48th Annual International Symposium on Computer Architecture (ISCA))
Full Text Available
Twig: Profile-Guided BTB Prefetching for Data Center Applications

https://doi.org/10.1145/3466752.3480124

Khan, Tanvir Ahmed; Brown, Nathan; Sriraman, Akshitha; Soundararajan, Niranjan K; Kumar, Rakesh; Devietti, Joseph; Subramoney, Sreenivas; Pokam, Gilles A; Litz, Heiner; Kasikci, Baris (October 2021, 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’21))
Full Text Available
Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications

Khan, Tanvir Ahmed; Zhang, Dexin; Sriraman, Akshitha; Devietti, Joseph; Pokam, Gilles; Litz, Heiner; Kasikci, Baris (January 2021, Proceedings of the 48th International Symposium on Computer Architecture)
Full Text Available
I-SPY: Context-Driven Conditional Instruction Prefetching with Coalescing

https://doi.org/10.1109/MICRO50266.2020.00024

Khan, Tanvir Ahmed; Sriraman, Akshitha; Devietti, Joseph; Pokam, Gilles; Litz, Heiner; Kasikci, Baris (October 2020, Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO))
Modern data center applications have rapidly expanding instruction footprints that lead to frequent instruction cache misses, increasing cost and degrading data center performance and energy efficiency. Mitigating instruction cache misses is challenging since existing techniques (1) require significant hardware modifications, (2) expect impractical on-chip storage, or (3) prefetch instructions based on inaccurate understanding of program miss behavior. To overcome these limitations, we first investigate the challenges of effective instruction prefetching. We then use insights derived from our investigation to develop I-SPY, a novel profile-driven prefetching technique. I-SPY uses dynamic miss profiles to drive an offline analysis of I-cache miss behavior, which it uses to inform prefetching decisions. Two key techniques underlie I-SPY's design: (1) conditional prefetching, which only prefetches instructions if the program context is known to lead to misses, and (2) prefetch coalescing, which merges multiple prefetches of non-contiguous cache lines into a single prefetch instruction. I-SPY exposes these techniques via a family of light-weight hardware code prefetch instructions. We study I-SPY in the context of nine data center applications and show that it provides an average of 15.5% (up to 45.9%) speedup and 95.9% (up to 98.4%) reduction in instruction cache misses, outperforming the state-of-the-art prefetching technique by 22.5%. We show that I-SPY achieves performance improvements that are on average 90.5% of the performance of an ideal cache with no misses.
Full Text Available
